Merge music branch to develop. #56

Merged
merged 82 commits into from
Nov 14, 2024

Conversation

vlachvojta
Collaborator

@vlachvojta vlachvojta commented Nov 3, 2023

Summary of changes in pull request #56 music->develop

  1. YOLO + Non-text regions
  2. Export + refactor of AltoXML and PageXML
  3. Music (optical music recognition)
  4. Changes in .ini config for parse_folder.py

YOLO + Non-text regions

  • Added LayoutExtractorYolo, a layout parser that uses a YOLO model to detect "non-text" regions (e.g. images, tables, etc.)
  • Added a category attribute to RegionLayout and TextLine, storing the YOLO output class (or 'text' for the original layout parsers)
  • To use LayoutExtractorYolo and YOLO inference, you have to install the ultralytics library manually (it causes library version conflicts when imported alongside other libraries)

See code in: pero_ocr/document_ocr/page_parser.py/LayoutExtractorYolo

Export + refactor of AltoXML and PageXML

  • Added an Enum for ALTO versions, which differ in baseline format:
    • in v2.x the baseline is a single float (mean of the Y baseline coords)
    • in v4.4 the baseline is a string in x1,y1 x2,y2 ... format (same as PageXML)
  • Moved the export and import functions into the specific classes (TextRegion + TextLine) for better readability and maintainability.

See code in: pero_ocr/core/layout.py/*.(to|from)_(page|alto)xml()
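
The two baseline formats above can be illustrated with a minimal sketch. `AltoVersion` and `baseline_attribute` are hypothetical names made up for this example and are not the identifiers used in pero_ocr/core/layout.py; rounding the coordinates to integers is also an assumption.

```python
from enum import Enum
from typing import List, Tuple, Union


class AltoVersion(Enum):  # hypothetical enum, not the one in pero_ocr
    V2_X = "2.x"
    V4_4 = "4.4"


def baseline_attribute(points: List[Tuple[float, float]],
                       version: AltoVersion) -> Union[float, str]:
    """Return a baseline value in the format expected by the ALTO version."""
    if version is AltoVersion.V2_X:
        # Older schema: a single float, here the mean of the Y coordinates.
        return sum(y for _, y in points) / len(points)
    # Newer schema: all points as "x1,y1 x2,y2 ...", the same as PageXML.
    # Rounding to integers is an assumption of this sketch.
    return " ".join(f"{round(x)},{round(y)}" for x, y in points)


baseline = [(0, 10), (100, 14)]
mean_y = baseline_attribute(baseline, AltoVersion.V2_X)
points_str = baseline_attribute(baseline, AltoVersion.V4_4)
```

Keeping the version in an Enum means the exporter branches on one explicit value instead of comparing version strings in several places.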

Music (optical music recognition)

  • Independent scripts for working with music staff transcriptions from PageLayout, located in the music folder.
  • Simple CLI for music exporting from PageLayout to MusicXML and/or MIDI in user_scripts/export_music.py

Changes in .ini config for parse_folder.py

Section names

Allows the user to define more than one Layout parser, Line Cropper and OCR section using a new naming format:

  • LAYOUT_PARSER_\d+ or LAYOUT_PARSER (updated)
  • LINE_CROPPER_\d+ or LINE_CROPPER
  • OCR_\d+ or OCR

See code in: pero_ocr/document_ocr/page_parser.py/PageParser.init_config_sections()
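
As a rough illustration of the naming scheme, section names can be matched with a single regular expression. This is only a sketch: `SECTION_PATTERN` and `find_engine_sections` are hypothetical helpers, not the code in `PageParser.init_config_sections()`, and the config content is made up.

```python
import configparser
import re

# Accepts e.g. "OCR" or "OCR_2", but not "OCR_OLD" (hypothetical pattern).
SECTION_PATTERN = re.compile(r"^(LAYOUT_PARSER|LINE_CROPPER|OCR)(?:_(\d+))?$")


def find_engine_sections(config: configparser.ConfigParser) -> dict:
    """Group matching section names by engine kind, keeping config order."""
    found = {"LAYOUT_PARSER": [], "LINE_CROPPER": [], "OCR": []}
    for name in config.sections():
        match = SECTION_PATTERN.match(name)
        if match:
            found[match.group(1)].append(name)
    return found


config = configparser.ConfigParser()
config.read_string("""
[SOME_OTHER_SECTION]
[LAYOUT_PARSER_1]
[LAYOUT_PARSER_2]
[LINE_CROPPER]
[OCR_1]
[OCR_OLD]
""")
sections = find_engine_sections(config)
```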

Execution order example

Sections are executed in sorted order: first all Layout parsers (alphabetically), then each Line Cropper followed by its corresponding OCR.

Sections in config:

LAYOUT_PARSER_1
LAYOUT_PARSER_2
LAYOUT_PARSER_3
LINE_CROPPER_1
LINE_CROPPER_2
OCR_1
OCR_2

Execution order after sorting:

LAYOUT_PARSER_1
LAYOUT_PARSER_2
LAYOUT_PARSER_3
LINE_CROPPER_1
OCR_1
LINE_CROPPER_2
OCR_2

See code in: pero_ocr/document_ocr/page_parser.py/PageParser.process_page()
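
The ordering in the example can be reproduced with a short sketch. `execution_order` is a hypothetical helper, and pairing a Line Cropper with an OCR by their shared suffix is an assumption based on the example above, not a description of `PageParser.process_page()`.

```python
from typing import List


def execution_order(section_names: List[str]) -> List[str]:
    """Order sections: all layout parsers first, then (LINE_CROPPER, OCR) pairs."""
    layout_parsers = sorted(
        n for n in section_names if n.startswith("LAYOUT_PARSER"))
    # Map suffix ("" or "_<number>") to the section name for each engine kind.
    croppers = {n[len("LINE_CROPPER"):]: n
                for n in section_names if n.startswith("LINE_CROPPER")}
    ocrs = {n[len("OCR"):]: n
            for n in section_names if n.startswith("OCR")}

    order = list(layout_parsers)
    # Suffixes are sorted as strings, so "_10" would sort before "_2".
    for suffix in sorted(set(croppers) | set(ocrs)):
        if suffix in croppers:
            order.append(croppers[suffix])
        if suffix in ocrs:
            order.append(ocrs[suffix])
    return order


configured = ["LAYOUT_PARSER_1", "LAYOUT_PARSER_2", "LAYOUT_PARSER_3",
              "LINE_CROPPER_1", "LINE_CROPPER_2", "OCR_1", "OCR_2"]
ordered = execution_order(configured)
```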

New attributes

LINE_CATEGORIES (list, [] by default) attribute for LAYOUT_PARSER:

  • Applies to the listed categories (+ 'text' by default, or [] for all categories)
  • After LayoutExtractorYolo creates a RegionLayout object of one of these categories, it also creates TextLine objects with the same category (otherwise the RegionLayout is left empty, with no TextLine objects).
  • Used for creating TextLine objects for music regions; it doesn't make sense for non-text regions (e.g. images), as they have no transcriptions.

See code in: pero_ocr/document_ocr/page_parser.py/LayoutExtractorYolo.process_page()

CATEGORIES (list, [] by default) attribute for LINE_CROPPER and OCR sections:

  • Apply LINE_CROPPER and OCR engines only to TextLine objects with these categories (or [] for all)
  • The page layout is filtered and then merged back in every process_page call using split_page_layout_by_categories and merge_page_layouts

See code in: pero_ocr/layout_engines/layout_helpers.py/split_page_layout_by_categories()
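
The filter-then-merge idea can be sketched on a simplified data model. Everything here is hypothetical: `Line`, `split_by_categories` and `merge` work on a flat list of lines, whereas the real split_page_layout_by_categories and merge_page_layouts operate on PageLayout objects and may behave differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:  # hypothetical stand-in for TextLine
    id: str
    category: Optional[str] = None
    transcription: Optional[str] = None


def split_by_categories(lines: List[Line],
                        categories: List[str]) -> Tuple[List[Line], List[Line]]:
    """Return (lines to process, lines to leave untouched)."""
    if not categories:
        # [] means "all": lines without a category (None) are included too.
        return list(lines), []
    selected = [line for line in lines if line.category in categories]
    rest = [line for line in lines if line.category not in categories]
    return selected, rest


def merge(selected: List[Line], rest: List[Line],
          original_ids: List[str]) -> List[Line]:
    """Put both parts back together in the original line order."""
    by_id = {line.id: line for line in selected + rest}
    return [by_id[line_id] for line_id in original_ids]


page = [Line("l1", "text"), Line("l2", "music"), Line("l3", None)]
music_lines, other_lines = split_by_categories(page, ["music"])
for line in music_lines:
    line.transcription = "<music engine output>"  # placeholder for an OCR call
merged = merge(music_lines, other_lines, [line.id for line in page])
```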

SUBSTITUTE_OUTPUT (bool, yes by default) attribute for OCR section:

  • If yes, substitute the output of the OCR engine using output_substitution_table from OCR_JSON (the OCR engine configuration), as a dictionary key->value substitution of symbols.
  • Symbols are obtained by splitting the line on whitespace.

See code in: pero_ocr/document_ocr/page_parser.py/PageOCR.substitute_transcriptions()

SUBSTITUTE_OUTPUT_ATOMIC (bool, no by default) attribute for OCR section:

  • If yes, translation is atomic at the page level: either all lines are translated or none.
  • If no, translation is best-effort: lines are translated independently, and a line that fails is left untranslated.

See code in: pero_ocr/document_ocr/page_parser.py/PageOCR.substitute_transcriptions()
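
A minimal sketch of how the two substitution settings could interact. The helper names are hypothetical, and treating a symbol that is missing from the table as a failed line is an assumption; the actual logic lives in `PageOCR.substitute_transcriptions()`.

```python
from typing import Dict, List, Optional


def substitute_line(line: str, table: Dict[str, str]) -> Optional[str]:
    """Translate one line symbol by symbol; return None if any symbol is unknown."""
    symbols = line.split()  # symbols are the line split on whitespace
    if any(symbol not in table for symbol in symbols):
        return None  # assumption: an unknown symbol makes the line fail
    return " ".join(table[symbol] for symbol in symbols)


def substitute_page(lines: List[str], table: Dict[str, str],
                    atomic: bool = False) -> List[str]:
    """Atomic: all lines or none. Best-effort: failed lines stay untranslated."""
    translated = [substitute_line(line, table) for line in lines]
    if atomic and any(result is None for result in translated):
        return list(lines)
    return [result if result is not None else line
            for line, result in zip(lines, translated)]


table = {"a": "clef-G2", "b": "note-C4"}  # made-up substitution table
lines = ["a b", "a x"]
best_effort = substitute_page(lines, table, atomic=False)
all_or_nothing = substitute_page(lines, table, atomic=True)
```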

UPDATE_TRANSCRIPTION_BY_CONFIDENCE (bool, no by default) attribute for OCR section:

  • If yes, update line transcription only if the new transcription has higher confidence.
  • If no, update transcription always.
  • Can be used for transcribing lines with multiple OCR engines to get the best result.

See code in: pero_ocr/document_ocr/page_parser.py/PageOCR.process_page()
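
The update rule can be sketched as a single decision function. `choose_transcription` is a hypothetical helper; always accepting the new result for a line with no previous transcription is an assumption of this sketch.

```python
from typing import Optional, Tuple


def choose_transcription(old_text: Optional[str], old_confidence: float,
                         new_text: str, new_confidence: float,
                         update_by_confidence: bool) -> Tuple[str, float]:
    """Decide which transcription a line keeps after another OCR run."""
    if old_text is None or not update_by_confidence:
        # Nothing to compare against, or the option is off: always update.
        return new_text, new_confidence
    if new_confidence > old_confidence:
        return new_text, new_confidence
    return old_text, old_confidence


# Two engines transcribe the same line; the more confident result is kept.
kept = choose_transcription("he1lo", 0.71, "hello", 0.93, True)
unchanged = choose_transcription("hello", 0.93, "he1lo", 0.71, True)
overwritten = choose_transcription("hello", 0.93, "he1lo", 0.71, False)
```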

... see commit messages for more ...

…ed labels (model output) to more verbose format usable by `export_music.py`.
…to define settings and `page_parser.py` to create music exporter object of `music/export_music/ExportMusicPage`.
…t.py`. Get names only from Yolo `result.names`.
… text Layout engine to work only with 'text' lines.
…h with its own setting and set of categories to work with.
# Conflicts:
#	pero_ocr/core/layout.py
@vlachvojta vlachvojta requested a review from ikiss-fit November 3, 2023 17:02
Parameter sets whether PageOCR should update a line with the new transcription:
 - every time (false)
 - only if the confidence is better (true)

(Applies when rerunning OCR on a previously transcribed line.)
@vlachvojta
Collaborator Author

vlachvojta commented Jun 14, 2024

TODOs from MartinK.

  • ALTO export baseline (all points, not just mean) using PointsType
    • ALTO version <4.2: original mean
    • ALTO version >=4.2: new PointsType
  • eval_ocr_pipeline_xml.py
    • what is wrong_order in eval_ocr_pipeline_xml.py - wrong order of lines IN REGION (are mapped lines' successors in region.lines also mapped?) (mapping is done by Levenshtein distance here)
    • test "wrong order numbers" with naive sorter - very similar
    • confidence distance stats
      • code is done; statistically, the differences are the same as when re-running the same code

Results from eval_ocr_pipeline_xml.py:

The first JSON report below is develop vs develop (second run); the second is develop vs music branch.
{
  "gt_sum_char": 435306,
  "gt_lines_count": 15084,
  "good_lines": 12976,
  "unmapped_gt_lines": 0,
  "unmapped_gt_chars": 0,
  "unmapped_input_lines": 1,
  "unmapped_input_chars": 0,
  "mapped_char_errors": 32,
  "wrong_order": 2108,
  "wrong_line_transcriptions": 0,
  "good_order": 12976,
  "non_zero_confidence_distances_len": 32,
  "total_files": 99,
  "non_zero_confidence_distance_files": 8,
  "confidence_distances": {
    "max": 0.3460000000000001,
    "min": 0,
    "mean": 0.0002338933415536375,
    "median": 0,
    "std": 0.007068152540888958,
    "non_zero_count": 32,
    "total": 12976
  }
}
{
  "gt_sum_char": 435306,
  "gt_lines_count": 15084,
  "good_lines": 13070,
  "unmapped_gt_lines": 0,
  "unmapped_gt_chars": 0,
  "unmapped_input_lines": 0,
  "unmapped_input_chars": 0,
  "mapped_char_errors": 3,
  "wrong_order": 2014,
  "wrong_line_transcriptions": 0,
  "good_order": 13070,
  "non_zero_confidence_distances_len": 9,
  "total_files": 99,
  "non_zero_confidence_distance_files": 5,
  "confidence_distances": {
    "max": 0.33499999999999996,
    "min": 0,
    "mean": 5.9066564651874526e-05,
    "median": 0,
    "std": 0.004118486144981323,
    "non_zero_count": 9,
    "total": 13070
  }
}

- Versions older than 4.2 define baseline as a simple float. (that's where the original baseline comes from)
- Version 4.2 and newer define baseline as a PointsType string with recommended format: "x1,y1 x2,y2 ..."
…port options.

- Versions older than 4.2 define baseline as a simple float. (baseline is exported as mean of all Y baseline points)
- Version 4.2 and newer define baseline as a PointsType string with recommended format: "x1,y1 x2,y2 ..."
Old input XMLs don't have a category => line.category = None, so categories for OCR (and others) have to be set to `[]` by default to process ALL PAGES.
1) Remove `ultralytics` and `music21` from the dependencies of the whole project. The user will have to install them when needed.
2) Import `ultralytics` only when needed, so it doesn't cause an import error with specific numpy versions.

Ultralytics has this dependency right now: "numpy>=1.23.5,<2.0.0". See current at [github.com/ultralytics/ultralytics/blob/main/pyproject.toml](https://github.com/ultralytics/ultralytics/blob/69cfc8aa228dbf1267975f82fcae9a24665f23b9/pyproject.toml#L67)
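
A deferred import of an optional dependency could look roughly like this. `require_optional` is a hypothetical helper written for illustration and is not part of pero_ocr; it relies only on the standard `importlib.import_module`.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency at the point of use, with a clear error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"'{module_name}' is required for {purpose} but is not installed. "
            f"Install it manually, e.g. `pip install {module_name}`."
        ) from error


# The import only happens when the feature is actually requested, e.g.:
#     ultralytics = require_optional("ultralytics", "YOLO layout detection")
json_module = require_optional("json", "demonstration")
```

Because nothing is imported at module load time, users who never enable the YOLO or music features are unaffected by the version constraints of those packages.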
@ikiss-fit
Contributor

@vlachvojta I have found a few bugs that should be resolved:

  • The condition in layout_engines/smart_sorter.py:286 is wrong: it causes problems when the PageLayout doesn't contain any region of the category the sorter is sorting, because it then returns an empty PageLayout. The condition should be "if the split PageLayout contains at least one region, then the sorting happens".
  • The lengths variable in music/music_structures.py:485 should probably be an np.array, since it is then used that way (line 499)
  • MusicExporter is probably missing a condition when adding a line to the overall music, because in some cases I get the following exception:
  File "/home/ikiss/projects/pero/pero-demo/ocr_pipeline.py", line 69, in process_file
    self.music_exporter.process_page(page_layout)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 59, in process_page
    self.export_page_layout(page_layout, page_layout.id)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 64, in export_page_layout
    parts = self.regions_to_parts(
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 120, in regions_to_parts
    part.add_textline(line)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 230, in add_textline
    new_measures_encoded = encode_measures(new_measures, len(self.measures) + 1)
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_exporter.py", line 302, in encode_measures
    measures_encoded.append(measure.encode_to_music21())
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 129, in encode_to_music21
    self.repr = self.encode_to_music21_polyphonic()
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 174, in encode_to_music21_polyphonic
    voices_repr = [voice.encode_to_music21_monophonic() for voice in voices]
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 174, in <listcomp>
    voices_repr = [voice.encode_to_music21_monophonic() for voice in voices]
  File "/mnt/matylda1/ikiss/pero/pero-ocr.omr/pero_ocr/music/music_structures.py", line 477, in encode_to_music21_monophonic
    self.repr.append(group.encode_to_music21_monophonic())
  File "/home/ikiss/venvs/pero-demo/lib/python3.10/site-packages/music21/stream/base.py", line 2637, in append
    self.coreGuardBeforeAddElement(e)
  File "/home/ikiss/venvs/pero-demo/lib/python3.10/site-packages/music21/stream/core.py", line 440, in coreGuardBeforeAddElement
    raise StreamException(
music21.exceptions21.StreamException: The object you tried to add to the Stream, None, is not a Music21Object.  Use an ElementWrapper object if this is what you intend.

vlachvojta and others added 9 commits August 5, 2024 13:39
In `smart_sorter.py`:
- if fewer than two engines are filtered, return the original page_layout and not only the split one.

In `music_structures.py`:
- change type of `lengths` to numpy array, fix min_length to take from numbers and not names.
- ensure `encoded_group` is not None before appending it to the voice.

full comment: [pero-ocr/pull/56/#issuecomment-2245202776](#56)
…whole region to positive or negative (ignore categories of lines inside the region)
Export multirest as a simple default 'whole' rest.
…fidence -- in case when there are no logits (i.e. logits.shape[0] == 0) the confidence cannot be calculated.
@ikiss-fit ikiss-fit merged commit 02e3d7a into develop Nov 14, 2024
@ikiss-fit ikiss-fit deleted the music branch November 14, 2024 16:16
Development

Successfully merging this pull request may close these issues.

Music pull request feedback Add region categories
3 participants